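Symptom

After the release update, Grafana could not display Prometheus data properly.

Problem Description

Widening the Grafana time range even slightly makes Grafana report the error: query processing would load too many samples into memory in query execution.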

Error Screenshot

[Screenshot: Grafana error message]

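Root Cause Analysis

After the release update, a change in the business logic probably caused a surge in Prometheus metrics, so the data Grafana pulls now exceeds a Prometheus limit. Searching around and reading the Prometheus GitHub repository explained why. A PromQL query can load a very large number of samples, which pushes Prometheus memory and CPU usage too high. To keep complex PromQL queries from consuming excessive resources, the Prometheus developers put a limit in the code. Starting with Prometheus 2.5.0, this limit can be tuned with the --query.max-samples flag to fit heavier workloads.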
Prometheus 2.5.0 release notes on GitHub:
URL: https://github.com/prometheus/prometheus/releases/tag/v2.5.0
Pull request: https://github.com/prometheus/prometheus/pull/4513
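
To make the effect of the time range concrete, here is a rough back-of-the-envelope sketch. It is not how the Prometheus engine actually accounts for samples; it only assumes that a range query keeps roughly one output point per series per step, plus the raw samples inside one range-vector window. The function names, the series count, and the 15-second scrape interval are hypothetical values chosen for illustration.

```python
def estimate_samples(series: int,
                     range_seconds: int,
                     step_seconds: int,
                     window_seconds: int = 0,
                     scrape_interval_seconds: int = 15) -> int:
    """Very rough estimate of the samples one range query holds in memory.

    Assumption (not the real engine logic): one output point per series per
    evaluation step, plus the raw samples of one range-vector window.
    """
    steps = range_seconds // step_seconds + 1
    result_points = series * steps
    window_points = series * (window_seconds // scrape_interval_seconds)
    return result_points + window_points


def exceeds_limit(estimate: int, max_samples: int) -> bool:
    """True if the estimate is above the configured --query.max-samples value."""
    return estimate > max_samples


if __name__ == "__main__":
    # Hypothetical panel: 20000 series, 60 s step, rate(...[5m]) style window.
    for label, span in (("1 hour", 3600), ("7 days", 7 * 24 * 3600)):
        est = estimate_samples(20000, span, 60, window_seconds=300)
        print(label, est, exceeds_limit(est, max_samples=50_000_000))
```

With a fixed step, the estimate grows almost linearly with the time range, which is consistent with the error appearing as soon as the Grafana time range is widened.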

Solution

The GS prometheus instance was upgraded during its migration, so we need to read the source code of the matching Prometheus version to find the default limit.
Prometheus source: https://github.com/prometheus/prometheus/blob/v2.15.2/cmd/prometheus/main.go

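The source shows that the default limit is 50000000 samples per query. Because the queries behind the GS Grafana panels load more samples than this limit allows, we need to raise it with the flag.
Parameter descriptions: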

--query.timeout=2m
Maximum time a query may take before being aborted.
--query.max-concurrency=20
Maximum number of queries executed concurrently.
--query.max-samples=50000000
Maximum number of samples a single query can load into memory. Note that queries will fail if they try to load more samples than this into memory, so this also limits the number of samples a query can return.

After discussion, the GS prometheus instance is currently started with the following parameters:

/data1/prometheus_gs_all/prometheus-2.15.2.linux-amd64/prometheus --config.file=/data1/prometheus_gs_all/prometheus-2.15.2.linux-amd64/prometheus.yml --web.listen-address=0.0.0.0:9090 --storage.tsdb.retention=90d --web.enable-lifecycle --web.external-url=http://143.92.123.123:9090 --query.max-samples=500000000 --query.timeout=20m --query.max-concurrency=200
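
After a restart it can be useful to confirm that the running instance really picked up the new values. The sketch below assumes the Prometheus HTTP API endpoint /api/v1/status/flags is reachable and returns the flags as a JSON object keyed by flag name without the leading dashes. The helper names and the base URL passed in are illustrative, not part of the original setup.

```python
import json
from urllib.request import urlopen


def query_flags(flags_json: str) -> dict:
    """Extract the query.* flags from a /api/v1/status/flags response body."""
    payload = json.loads(flags_json)
    if payload.get("status") != "success":
        raise ValueError("unexpected response status: %r" % payload.get("status"))
    return {name: value
            for name, value in payload["data"].items()
            if name.startswith("query.")}


def fetch_query_flags(base_url: str) -> dict:
    """Fetch the flags from a running Prometheus (base_url is an assumption)."""
    with urlopen(base_url.rstrip("/") + "/api/v1/status/flags", timeout=10) as resp:
        return query_flags(resp.read().decode("utf-8"))
```

For example, calling fetch_query_flags("http://localhost:9090") on the Prometheus host should list query.max-samples, query.timeout, and query.max-concurrency with the values given on the command line above.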

References:
http://blog.kankanan.com/article/query-processing-would-load-too-many-samples-into-memory-in-query-execution.html
https://github.com/prometheus/prometheus/blob/v2.15.2/cmd/prometheus/main.go
https://github.com/prometheus/prometheus/releases/tag/v2.5.0
https://github.com/prometheus/prometheus/pull/4513


This article is from "Jack Wang Blog": http://www.yfshare.vip/2022/10/27/解决Grafana query processing would load too many samples into memory in query exec/